Name Disambiguation Based on Heterogeneous Network Representation Learning
TANG Zhengzheng1,2, HONG Xuehai2,3, WANG Yang1,2, LI Yuxuan1,2
1. Center of Information Development Strategy and Evaluation, Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190
2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049
3. Strategy Research Center of Information Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
Abstract: When a user searches a system for an author by name, returning every paper published under that name, regardless of which person actually wrote it, degrades the user experience. Name disambiguation improves retrieval accuracy, so a name disambiguation method based on heterogeneous network representation learning is proposed. First, a heterogeneous paper network is constructed for each ambiguous name. Then, a representation vector for each paper node is learned from the heterogeneous network and Word2Vec. Finally, the papers are partitioned and assigned to distinct author entities through rule matching and density-based clustering with noise. The proposed method achieves better performance on the OAG-WhoIsWho competition dataset, which verifies its effectiveness.
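The final step of the abstract, density-based clustering with noise followed by rule matching, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that paper vectors have already been learned, uses a plain DBSCAN-style procedure over cosine distance, and treats rule matching as a post-processing step that reattaches noise papers to a cluster when they share a coauthor. The abstract does not specify the actual rules, so that last part is a guess. The vectors, the coauthor names, and the `eps` and `min_pts` values are all hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def dbscan(vectors, eps, min_pts):
    """Return one cluster label per vector; -1 marks noise."""
    n = len(vectors)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # Every point within eps of point i, including i itself.
        return [j for j in range(n)
                if cosine_distance(vectors[i], vectors[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


def assign_noise(labels, coauthors):
    """Hypothetical rule: a noise paper joins the cluster of any paper that
    shares a coauthor with it; otherwise it becomes its own new cluster."""
    labels = list(labels)
    next_label = max(labels, default=-1) + 1
    for i, label in enumerate(labels):
        if label != -1:
            continue
        for j, other in enumerate(labels):
            if j != i and other != -1 and coauthors[i] & coauthors[j]:
                labels[i] = other
                break
        else:
            labels[i] = next_label
            next_label += 1
    return labels


# Toy paper vectors for one ambiguous name (hypothetical values).
vectors = [
    [1.0, 0.1], [0.9, 0.2], [1.0, 0.0],   # papers by one person
    [0.1, 1.0], [0.0, 0.9], [0.2, 1.0],   # papers by another person
    [0.7, 0.7], [-1.0, 0.5],              # isolated papers
]
# Coauthor sets per paper (hypothetical names).
coauthors = [
    {"A. Li"}, {"A. Li", "B. Wu"}, {"B. Wu"},
    {"C. Xu"}, {"C. Xu"}, {"D. He"},
    {"D. He"}, {"E. Ma"},
]

labels = dbscan(vectors, eps=0.05, min_pts=3)
final_labels = assign_noise(labels, coauthors)
print(labels)        # [0, 0, 0, 1, 1, 1, -1, -1]
print(final_labels)  # [0, 0, 0, 1, 1, 1, 1, 2]
```

In this toy run the two isolated papers are first marked as noise. One is then attached to an existing author cluster through a shared coauthor, and the other becomes a separate author entity. A density-based method suits the task because the number of real authors behind an ambiguous name is not known in advance.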
Received: 08 March 2021
Fund: National Natural Science Foundation of China (No. 92046017), Information Engineering Project of Chinese Academy of Sciences (No. XXH13504-03)
Corresponding Author:
HONG Xuehai, Ph.D., professor. His research interests include high performance computing, big data and cloud computing, and artificial intelligence.
About authors:
TANG Zhengzheng, Ph.D. candidate. His research interests include machine learning, data mining, and graph representation learning.
WANG Yang, Ph.D., senior engineer. His research interests include informatization development strategy research, big data analysis, and situational awareness systems.
LI Yuxuan, master's student. His research interests include machine learning and information retrieval.